Review for NeurIPS paper: Provably Good Batch Reinforcement Learning Without Great Exploration
Weaknesses: I also feel that the paper would have benefited from a discussion of these, rather than simply asserting that existing methods do not give good results. In particular, the conditions under which existing methods do and do not work should be discussed more explicitly than they currently are. Moreover, I think the experiments on CartPole and Hopper are not indicative of the method's performance: these environments have deterministic dynamics and the dataset was collected as trajectories (so s' is as frequent as s in the distribution \mu; see my point below), and hence the authors' choice of masking reduces to action-conditioned masking only.
Some other questions that I have:
- From the analysis perspective, the paper says that prior works such as Kumar et al. 2019, which use action-conditional constraints and concentrability, do not get the same error rate. Is the main issue behind this limitation that the notion of concentrability used in Kumar et al. and other works is trajectory-centric rather than defined on the state-action marginal?
Review for NeurIPS paper: Provably Good Batch Reinforcement Learning Without Great Exploration
This is a nice paper, with a new idea and strong theoretical backing. The reviews, rebuttal and discussion periods led to a lot of detailed feedback, so I'd encourage the authors to include as much of this as possible in the camera-ready version, and specifically revise for clarity around the points that were unclear to the reviewers in the first submission.